ABSTRACT Viruses have a profound influence on the ecology and evolution of plankton, but our understanding of the composition of aquatic viral communities is still rudimentary. This is especially true of viruses having RNA genomes. The limited data that have been published suggest that the RNA virioplankton is dominated by viruses with positive-sense, single-stranded (+ss) genomes that have features in common with those of eukaryote-infecting viruses in the order Picornavirales (picornavirads). In this study, we investigated the diversity of the RNA virus assemblages in tropical coastal seawater samples using targeted PCR and metagenomics. Amplification of RNA-dependent RNA polymerase (RdRp) genes from fractions of a buoyant density gradient suggested that the distribution of two major subclades of the marine picornavirads was largely congruent with the distribution of total virus-like RNA, a finding consistent with their proposed dominance. Analyses of the RdRp sequences in the library revealed the presence of many diverse phylotypes, most of which were related only distantly to those of cultivated viruses. Phylogenetic analysis suggests that there were hundreds of unique picornavirad-like phylotypes in one 35-liter sample that differed from one another by at least as much as the differences among currently recognized species. Assembly of the sequences in the metagenome resulted in the reconstruction of six essentially complete viral genomes that had features similar to those of viruses in the genus Bacillarnavirus and the families Dicistroviridae and Marnaviridae. Comparison of the tropical seawater metagenomes with those from other habitats suggests that +ssRNA viruses are generally the most common types of RNA viruses in aquatic environments, but biases in library preparation remain a possible explanation for this observation.

IMPORTANCE Marine plankton account for much of the photosynthesis and respiration on our planet, and they influence the cycling of carbon and the distribution of nutrients on a global scale. Despite the fundamental importance of viruses to plankton ecology and evolution, most of the viruses in the sea, and the identities of their hosts, are unknown. This report is one of very few that delve into the genetic diversity of RNA-containing viruses in the ocean. The data expand the known range of viral diversity and shed new light on the physical properties and genetic composition of RNA viruses in the ocean.
Viruses are integral to life in the ocean, contributing to the disease and mortality of their hosts, catalyzing evolution by mediating gene exchange, and influencing the partitioning of nutrients among trophic levels (1). The genetic material of viruses may consist of single-stranded DNA (ssDNA), double-stranded DNA (dsDNA), ssRNA, or dsRNA, depending on the virus. Most studies of marine virioplankton over the past two decades have focused on DNA-containing viruses, which appear to be predominantly bacteriophages (2). Our knowledge about the RNA viruses in the marine virioplankton is much more limited, but the available data suggest that virtually all of them infect eukaryotic organisms, most likely protists (3). These RNA viruses were often assumed to be a minor component of the virioplankton, but recent data suggest that, at least at the one location sampled, they were as abundant in seawater as DNA viruses (4).

At present, our knowledge of the RNA viruses that infect the marine protistan plankton is limited to what we have learned from 13 isolates and the results of a few molecular surveys (3). The RNA viruses that have been isolated so far infect some of the major taxa of marine protists, including diatoms (5-9), dinoflagellates (10), raphidophytes (11), prasinophytes (12), and thraustochytrids (13). Phylogenies of viruses in the order Picornavirales based on alignments of RNA-dependent RNA polymerase (RdRp) sequences are congruent with the taxonomic assignments established by the International Committee on Taxonomy of Viruses (ICTV) (14-16); thus, the RdRp is a useful molecular marker for investigating the diversity of viruses in the order Picornavirales (picornavirads). Cultivation-independent surveys targeting the RdRp of picornavirads in samples from temperate waters (17) and subtropical waters (18) have revealed a level of genetic diversity that is poorly represented by the limited number of existing cultures (19).
Metagenomic methods can provide a more comprehensive view of the genetic diversity of RNA viruses than single-gene surveys. This approach has been used to investigate RNA virus diversity in a variety of habitats, including reclaimed water (20), untreated wastewater (21, 22), hot springs (23), a freshwater lake (24), and various marine habitats (4, 25, 26). Application of this method to RNA viruses harvested from the coastal waters of British Columbia suggested that most were positive-sense, single-stranded RNA (+ssRNA) viruses distantly related to established taxa (25). Very few dsRNA viral sequences were identified, and no sequences from RNA phages, negative-sense, single-stranded RNA (−ssRNA) viruses, or retroviruses were detected (25). These data offered a first glimpse into the genomic diversity of the natural RNA virioplankton in seawater but were limited to a single location in temperate coastal waters.

To broaden our understanding of RNA viral ecology in the ocean, we estimated the abundance of RNA viruses relative to that of DNA viruses and the diversity of RNA viral communities harvested from coastal tropical waters. Because of their small genome sizes, RNA viruses cannot be accurately quantified in seawater even with RNA-specific stains (27, 28); therefore, we used an indirect approach in which the relative masses of total viral RNA and DNA were divided by estimates of the mass of nucleic acid per RNA or DNA virion in the sample to determine relative abundances. Our data, from a tropical coastal site sampled on two occasions, indicated that the abundance of RNA viruses could at times exceed that of DNA viruses in seawater (4). An initial metagenomic analysis of RNA virus diversity in those samples suggested that, as in an earlier study of temperate waters (25), +ssRNA viruses in the order Picornavirales dominated in tropical coastal waters. The goal of our previous report was to estimate the relative abundances of RNA and DNA viruses. In this report, we expand our analysis of the two Kane'ohe Bay viromes. Specifically, we present data on the buoyant density distributions of marine picornavirads, provide a more detailed analysis of the RNA virus metagenomes (including the reconstruction of six genomes), and provide estimates of the diversity of marine picornavirads using a number of different approaches.
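The indirect abundance estimate described above reduces to dividing a measured nucleic-acid mass by the nucleic-acid mass contained in a single virion. The Python sketch below illustrates only that arithmetic; the average molar masses per nucleotide and per base pair, the genome sizes, and all function names are illustrative assumptions, not the values or code used in reference 4.

```python
"""Hypothetical sketch: relative RNA:DNA virus abundance from nucleic-acid mass."""

AVOGADRO = 6.02214076e23  # molecules per mole

# Approximate average molar masses; illustrative assumptions, not study values.
SSRNA_G_PER_MOL_PER_NT = 340.0  # single-stranded RNA, per nucleotide
DSDNA_G_PER_MOL_PER_BP = 650.0  # double-stranded DNA, per base pair


def mass_per_virion_g(genome_length: float, g_per_mol_per_unit: float) -> float:
    """Nucleic-acid mass (g) in one virion with a genome of the given length."""
    return genome_length * g_per_mol_per_unit / AVOGADRO


def virion_count(total_mass_g: float, per_virion_g: float) -> float:
    """Number of virions implied by a total nucleic-acid mass."""
    return total_mass_g / per_virion_g


def rna_to_dna_ratio(rna_mass_g: float, dna_mass_g: float,
                     rna_genome_nt: float, dna_genome_bp: float) -> float:
    """Ratio of RNA virions to DNA virions implied by the two mass measurements."""
    rna = virion_count(rna_mass_g, mass_per_virion_g(rna_genome_nt, SSRNA_G_PER_MOL_PER_NT))
    dna = virion_count(dna_mass_g, mass_per_virion_g(dna_genome_bp, DSDNA_G_PER_MOL_PER_BP))
    return rna / dna


if __name__ == "__main__":
    # Made-up inputs: equal RNA and DNA masses, a 9-kb +ssRNA genome,
    # and a 50-kbp dsDNA genome.
    print(f"{rna_to_dna_ratio(1e-9, 1e-9, 9_000, 50_000):.1f}")
```

With these made-up inputs the script prints 10.6: the same mass of a small, single-stranded genome corresponds to roughly ten times as many virions as a large double-stranded one, which is why a mass-based comparison alone would understate RNA virus numbers.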

RESULTS AND DISCUSSION 

General description of the metagenomes. A general characterization of the two metagenomes analyzed in this study can be found elsewhere (4), but the salient features are summarized here to provide context for the new analyses that are the focus of this paper. Pyrosequencing of the two libraries prepared from samples collected in 2009 and 2010 from coastal O'ahu resulted in a combined total of 249,941 high-quality reads and approximately 89 Mbp of sequence. The majority of sequences in each library assembled into contigs (69% and 78% in the 2009 and 2010 libraries, respectively). Approximately 54% of the total sequences were most similar to those of RNA viruses, 4% appeared to derive from cells, and 42% could not be assigned. Of the sequences that were identified as viral, >97% were most similar to those of +ssRNA viruses (95% specifically to members of the order Picornavirales). The remaining 3% were most similar to those of dsRNA viruses, with the majority having similarity to Micromonas pusilla reovirus (MpRV), the sole member of the genus Mimoreovirus.

Buoyant densities of marine picornavirad-like viruses. Buoyant density gradients are frequently used to purify viruses from natural assemblages for analysis (29-31), but there are few reports describing the density distribution of uncultivated assemblages of DNA-containing viruses (32-34) and none for RNA-containing viruses. This information is useful for optimizing purification strategies (29). Although picornavirads dominated our metagenomic libraries, the libraries were prepared from pooled fractions representing a specific density range (1.38 to 1.53 g ml⁻¹), which was chosen conservatively based only on the distribution of total RNA. To better understand the density distribution of marine picornavirads, we analyzed each fraction across the entire gradient (<1.2 to ≥1.6 g ml⁻¹) separately, amplifying by PCR first with degenerate primers targeting two subclades of marine picornavirads and then with primers designed to specifically target the RdRp genes of a putative high-buoyant-density phylotype and a putative low-buoyant-density phylotype. Amplification with the degenerate primers resulted in variable amplicon yields that depended on which fraction of the CsCl buoyant density gradient was assayed. The patterns were similar for the two subclades assayed, with both showing a peak in the 1.45 g ml⁻¹ fraction (Fig. 1). The distribution was somewhat broader for subclade 1, the primers for which were also found to capture a broader range of phylogenetic diversity (Culley and Steward [18]). Clone libraries prepared from subclade 1 RdRp amplicons derived from one of the lower-density fractions (1.38 g ml⁻¹) and one of the higher-density fractions (1.49 g ml⁻¹), on either side of the main amplification peak, revealed that some sequences were present in both libraries but that others were detected in only one library or the other (see Fig. S1 in the supplemental material). Reverse transcription-quantitative PCR (RT-qPCR) using primers designed to target one of the sequences found only in the lower-density library (phylotype A) or one of the sequences found only in the higher-density library (phylotype B) revealed target distributions consistent with the clone library results (Fig. 1). Specifically, the concentration of phylotype A was much higher than that of phylotype B in the 1.38 g ml⁻¹ fraction, and the concentration of phylotype B was higher than that of phylotype A in the 1.49 g ml⁻¹ fraction. In both cases, however, the distribution of the target was bimodal, with a local maximum in or near the density fraction from which the sequence derived but an overall maximum in the 1.43 g ml⁻¹ fraction (Fig. 1).
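Calling an RT-qPCR profile across gradient fractions bimodal amounts to finding more than one interior local maximum and noting which one is the global maximum. The Python sketch below is a minimal, hypothetical illustration of that check; the densities and copy numbers are invented for demonstration, are not the Fig. 1 data, and this is not a procedure described in the study.

```python
from typing import List, Sequence, Tuple


def local_maxima(values: Sequence[float]) -> List[int]:
    """Indices of interior points strictly greater than both neighbours.

    Gradient end points and plateaus (equal adjacent values) are not reported.
    """
    return [i for i in range(1, len(values) - 1)
            if values[i] > values[i - 1] and values[i] > values[i + 1]]


def describe_profile(densities: Sequence[float],
                     copies: Sequence[float]) -> Tuple[List[float], float]:
    """Return the densities of all local maxima and of the global maximum."""
    if len(densities) != len(copies):
        raise ValueError("densities and copies must have the same length")
    peaks = [densities[i] for i in local_maxima(copies)]
    global_peak = densities[max(range(len(copies)), key=copies.__getitem__)]
    return peaks, global_peak


if __name__ == "__main__":
    # Invented example profile: density (g/ml) and copies per fraction.
    dens = [1.35, 1.38, 1.40, 1.43, 1.45, 1.47, 1.49, 1.52]
    cps = [5, 80, 60, 300, 120, 40, 20, 5]
    peaks, top = describe_profile(dens, cps)
    print(peaks, top)  # two local maxima, so this toy profile is bimodal
```

In this toy profile there is a minor peak at 1.38 g/ml and an overall maximum at 1.43 g/ml, the same qualitative shape described for phylotype A. Real qPCR data are noisy, so in practice a peak would usually also need to exceed some noise threshold.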

The reason for the bimodal peaks observed for both of the specific phylotypes assayed is unknown. The positions of the clearly separated minor peaks make sense, considering the criteria used to choose the targets, but the presence of major peaks for both targets in the same intermediate density fraction is curious. The pattern does not appear to be a result of nonspecific amplification, since melting curves of the amplicons indicated a single narrow peak at the same melting temperature in all fractions for a given primer set. Differences in total RNA levels among the fractions could have influenced the efficiency of the RT reactions (35), but the offset between the RT-qPCR peaks and the total RNA peak suggests that this cannot be the sole explanation for the observed distributions. Two alternative explanations are that (i) identical target sequences are found in viruses that differ in buoyant density or (ii) many of the viruses of each phylotype were aggregated with each other, with other viruses, or with some other material that altered their equilibrium buoyant density, a phenomenon that has been observed previously (34).

FIG 1 Distribution of picornavirad-like viruses in a CsCl buoyant density gradient. (Upper panel) The total RNA from the 2009 sample measured in each fraction of a buoyant density gradient (redrawn using data from reference 4) is shown along with the average amplification signal from endpoint PCR with degenerate primers targeting marine picornavirad-like viruses. The amplification signal value (in arbitrary relative fluorescence units) was determined by image analysis of the amplicons on an agarose gel and is presented as the running average of the signal for two sets of primers broadly targeting marine picornavirad-like subclade 1 (Mplsc1) and Mplsc2. (Lower panel) The copy numbers, as determined by RT-qPCR, of two specific RdRp phylotypes that were identified by cloning and sequencing of amplicons from two of the fractions. The shaded regions in both panels indicate the fractions used to create clone libraries and are labeled according to the phylotype (A or B) that was found only in that library.

Regardless of the details of the phylotype-specific distributions, the amplification data suggest that the buoyant density distribution of picornavirads was primarily in the density range from 1.35 to 1.5 g ml⁻¹. The distribution was similar to that of total RNA, the most notable exception being the second peak in total RNA at high densities (ca. 1.6 g ml⁻¹), which is not accompanied by an increasing signal for picornavirads (Fig. 1). This suggests that the RNA in the denser fractions is qualitatively different from that in the primary RNA peak and may not be of viral origin.
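The Fig. 1 legend states that the endpoint-PCR signal is presented as a running average for two primer sets but does not give the averaging window. One plausible reading, sketched below in Python, averages the two primer sets within each fraction and then applies a centered moving average across adjacent fractions. The three-fraction window, the truncated windows at the gradient ends, and the function name are assumptions for illustration only.

```python
from typing import List, Sequence


def combined_running_average(signal_a: Sequence[float],
                             signal_b: Sequence[float],
                             window: int = 3) -> List[float]:
    """Mean of two primer-set signals per fraction, smoothed by a centred
    moving average whose window is truncated at the ends of the gradient."""
    if len(signal_a) != len(signal_b):
        raise ValueError("signals must cover the same fractions")
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    per_fraction = [(a + b) / 2 for a, b in zip(signal_a, signal_b)]
    half = window // 2
    smoothed = []
    for i in range(len(per_fraction)):
        lo, hi = max(0, i - half), min(len(per_fraction), i + half + 1)
        chunk = per_fraction[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


if __name__ == "__main__":
    # Invented arbitrary-unit signals for five fractions.
    sc1 = [10, 40, 90, 50, 20]
    sc2 = [6, 30, 70, 40, 10]
    print(combined_running_average(sc1, sc2))
```

Smoothing of this kind damps fraction-to-fraction noise from gel image analysis, at the cost of slightly broadening peaks; with `window=1` the function simply returns the per-fraction means of the two primer sets.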

RdRp viral diversity in the metagenome. A search for all likely RdRp sequences (i.e., those that contained, at a minimum, the same two of seven conserved motifs) returned 531 (517 picornavirad-like and 14 reovirid-like) sequences in the 2009 library and 300 (292 picornavirad-like and 8 reovirid-like) in the 2010 library. A subset of these sequences that were longer and contained four of the seven conserved RdRp motifs was analyzed phylogenetically. These longer sequences (51 picornavirad-like and 3 reovirid-like sequences in 2009; 21 picornavirad-like and 3 reovirid-like sequences in 2010) formed large clusters with other environmental RdRp sequences, although some of these clusters were not well supported (maximum-likelihood support values of <80), and most were related only distantly to RdRp sequences from cultivated representatives (Fig. 2). One well-supported cluster (maximum-likelihood support value of 100) grouped 9 environmental sequences with two viruses (CcloRNAV01 [Cylindrotheca closterium RNA virus 01] and CcloRNAV02) that infect a species of pennate diatom. Of the six longer reovirid-like RdRp sequences analyzed, three formed a well-supported cluster (maximum-likelihood support value of 92) with MpRV, the only classified reovirid known to infect a marine protist (Fig. 3). Two other sequences formed a cluster distantly related to other genera within the family, and one sequence clustered closely (maximum-likelihood support value of 100) with ESRV (Eriocheir sinensis reovirus), a pathogen of the Chinese mitten crab.

Extrapolation from the frequency distribution of unique RdRp amino acid sequences using the mean Chao1 estimator suggested that there were on the order of 600 to 1,000 phylotypes (95% confidence interval [CI], 500 to 1,500) when extrapolating from the shorter sequences (2009 and 2010 samples) and around 400 phylotypes (95% CI, 200 to 1,000) when extrapolating from the longer sequences (2009 sample only). To put the sequence diversity into a taxonomic context, we clustered the sequences at a conservatively defined species level (>68% amino acid [aa] identity) (18), which resulted in a minimum of 39 and 21 different clusters in the 2009 and 2010 libraries, respectively, with five of those appearing in both libraries. Extrapolation using the Chao1 estimator for the 2009 sample resulted in an estimated species-level richness of 300 (95% CI, 100 to 900). This analysis could not be done for the 2010 sample, or for any of the reovirus-like sequences, because all of the sequences retrieved were unique in those instances.
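The species-level clustering and richness extrapolation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: sequences are taken to be pre-aligned to equal length, clustering is a simple greedy centroid scheme at >68% identity (the clustering software is not named in this section), and richness uses the bias-corrected form of Chao1, S_obs + F1(F1 − 1)/(2(F2 + 1)), which, unlike the classic form S_obs + F1²/(2F2), stays defined when there are no doubletons. The confidence intervals reported above are not reproduced.

```python
from typing import List, Sequence


def identity_fraction(a: str, b: str) -> float:
    """Fraction of identical residues between two aligned sequences,
    ignoring columns in which either sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def greedy_cluster_sizes(seqs: Sequence[str], threshold: float = 0.68) -> List[int]:
    """Greedy centroid clustering: each sequence joins the first centroid it
    matches at > threshold identity, otherwise it founds a new cluster.
    Returns the number of members in each cluster."""
    centroids: List[str] = []
    sizes: List[int] = []
    for s in seqs:
        for k, c in enumerate(centroids):
            if identity_fraction(s, c) > threshold:
                sizes[k] += 1
                break
        else:
            centroids.append(s)
            sizes.append(1)
    return sizes


def chao1_bias_corrected(abundances: Sequence[int]) -> float:
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)),
    where F1 and F2 are the numbers of singleton and doubleton clusters."""
    counts = [n for n in abundances if n > 0]
    f1 = sum(1 for n in counts if n == 1)
    f2 = sum(1 for n in counts if n == 2)
    return len(counts) + f1 * (f1 - 1) / (2 * (f2 + 1))


if __name__ == "__main__":
    # Toy aligned amino acid fragments (invented).
    fragments = ["MKVLAG", "MKVLAS", "MRVLAS", "WQEPTD", "WQEPTN", "HHGTYC"]
    sizes = greedy_cluster_sizes(fragments)
    print(sizes, chao1_bias_corrected(sizes))
```

Greedy clustering depends on input order: in the toy example, MRVLAS shares 5 of 6 residues with MKVLAS but only 4 of 6 (67%) with the centroid MKVLAG, so it founds its own cluster. When every sequence is a singleton, F2 is 0 and the estimate is driven entirely by F1, so it says little about true richness, which is consistent with the analysis not being attempted when all sequences were unique.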

FIG 2 Mpl-region RdRp phylotypes. An unrooted maximum-likelihood (ML) tree based on an alignment of the translated RdRp region targeted by the degenerate Mpl primer sets described by Culley and Steward (18) is shown. Included in this analysis are environmental sequences (black branches), composed of sequences from this study as well as from three previous studies published by Culley and Steward (17, 18, 25), and homologous sequences from representative viruses from the established taxa (orange branches) in the order Picornavirales. The specific sequences in the alignment are listed in Table S1 in the supplemental material. The scale bar is the equivalent of 0.3 substitutions per site. The statistical support values shown are percentages calculated by the aLRT method. Dashed branches represent clades with support values greater than 80. CcloRNAV, Cylindrotheca closterium RNA virus 01 and 02.

None of the RdRp nucleotide sequences that we obtained from Kane'ohe Bay were identical to sequences previously derived from the waters of coastal British Columbia. Nor were there identical nucleotide sequences shared between the 2009 and 2010 samples. However, three of the nucleotide sequences from the 2009 sample in this study were identical to sequences from a prior sampling of Kane'ohe Bay in 2006. After translation, five 2009 RdRp phylotypes were identical to phylotypes from the 2010 sample.
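The finding that no RdRp nucleotide sequences were shared between the 2009 and 2010 samples, yet five phylotypes were identical after translation, follows from the redundancy of the genetic code: synonymous codons change the nucleotide sequence without changing the protein. The Python sketch below illustrates this with the standard genetic code. It assumes the sequences are already trimmed to the same reading frame, and the example fragments are invented.

```python
from itertools import product
from typing import Iterable, Set

BASES = "TCAG"
# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def normalize(nt: str) -> str:
    """Upper-case and convert RNA (U) to the DNA alphabet (T)."""
    return nt.upper().replace("U", "T")


def translate(nt: str) -> str:
    """Translate an in-frame sequence; codons with ambiguous bases become 'X'."""
    seq = normalize(nt)
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def shared_nucleotide(a: Iterable[str], b: Iterable[str]) -> Set[str]:
    """Sequences identical at the nucleotide level in both samples."""
    return {normalize(s) for s in a} & {normalize(s) for s in b}


def shared_protein(a: Iterable[str], b: Iterable[str]) -> Set[str]:
    """Sequences identical after translation in both samples."""
    return {translate(s) for s in a} & {translate(s) for s in b}


if __name__ == "__main__":
    sample_2009 = ["AUGGCUAAA", "AUGUUUGGC"]  # invented in-frame fragments
    sample_2010 = ["AUGGCCAAG", "AUGCCCGGA"]
    print(shared_nucleotide(sample_2009, sample_2010))  # set()
    print(shared_protein(sample_2009, sample_2010))     # {'MAK'}
```

Here GCU/GCC both encode alanine and AAA/AAG both encode lysine, so two fragments with no shared nucleotide sequence collapse to the same phylotype after translation.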

The RdRp amino acid sequences in our libraries spanned the phylogenetic breadth represented in previous targeted gene surveys (17, 18) and included some new, deeply branching groups. Most sequences did not cluster near sequences from the few cultivated representatives. Since viral RdRp sequences tend to cluster according to host phylogeny (18), the phylogenetic distances among the RdRp sequences suggest that a great many protists from diverse clades in seawater are being lysed by RNA viruses at any given time.

FIG 3 Reovirid RdRp tree. An unrooted ML tree based on an alignment of sequences from this study that contain regions 1 to 4 conserved in the RdRp of reovirids (48), together with sequences from representative viruses from the established genera in the family Reoviridae, is shown. All sequences above and below the horizontal line are in the subfamilies Spinareovirinae and Sedoreovirinae, respectively. The specific sequences in the alignment and the full names of the virus species represented by abbreviations in the figure are listed in Table S1 in the supplemental material. The scale bar is the equivalent of 0.3 substitutions per site. The statistical support values shown are percentages greater than 80 as calculated by the aLRT method.

Assembly and analysis of picornavirad-like genomes. Six complete or near-complete genomes were assembled from the metagenomic libraries (Fig. 4). These contigs (KB2009_con15, KB2009_con28, KB2009_con55, KB2009_con74, KB2009_con88, and KB2010_con16) ranged in size from 8,330 to 9,465 bp (mean, 9,008 bp) and had GC contents ranging from 36.4% to 46.8% (mean, 41.5%). Primer pairs designed to uniquely amplify a region within the RdRp gene of each assembled genome produced amplicons of the expected size and predicted sequence when the original RNA extract was used as the template (data not shown). Each genome contained either one or two large open reading frames (ORFs) that encoded polyproteins ranging in size from 906 to 2,827 amino acids (Fig. 4 and Table 1). Domains within these polyproteins were similar to conserved domains of the nonstructural and structural proteins of known +ssRNA viruses in the order Picornavirales.
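Because the assembled genomes are positive-sense, their polyprotein ORFs can be sought on the forward strand only. The Python sketch below is a simplified, hypothetical ORF scan (from an ATG to the first in-frame stop codon, in each of the three forward frames); it is not the annotation pipeline used for Fig. 4, and the `min_aa` cutoff is an arbitrary choice. Because its amino acid length excludes the stop codon, the coding length in bp is exactly three times the length in aa, the same relationship seen between the bp and aa columns of Table 1 (e.g., 5,388 bp and 1,796 aa).

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def forward_orfs(seq: str, min_aa: int = 100) -> List[Tuple[int, int, int]]:
    """Find ATG-initiated ORFs on the forward (genomic, + sense) strand.

    Returns (start, stop_codon_start, length_aa) tuples, 0-based, longest
    first. length_aa excludes the stop codon, so the coding length in bp,
    stop_codon_start - start, equals 3 * length_aa. ORFs that run off the
    end of the sequence without a stop codon are ignored.
    """
    s = seq.upper().replace("U", "T")
    orfs: List[Tuple[int, int, int]] = []
    for frame in range(3):
        start = None
        for i in range(frame, len(s) - 2, 3):
            codon = s[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG after the previous stop gives the longest ORF
            elif start is not None and codon in STOP_CODONS:
                length_aa = (i - start) // 3
                if length_aa >= min_aa:
                    orfs.append((start, i, length_aa))
                start = None
    return sorted(orfs, key=lambda orf: orf[2], reverse=True)


if __name__ == "__main__":
    toy = "CC" + "AUG" + "GCU" * 5 + "UAA" + "GG"  # invented toy sequence
    print(forward_orfs(toy, min_aa=3))  # [(2, 20, 6)]
```

A realistic scan of a ~9-kb genome would use a much larger `min_aa` (the polyproteins in Table 1 are 906 to 2,827 aa) so that short, spurious ORFs are discarded.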

Phylogenetic analysis of the full-length RdRp genes from these assemblies and evidence from comparative genome organization analyses suggested that four of the assembled genomes (KB2009_con28, KB2009_con74, KB2009_con55, and KB2010_con16) were most similar to those of known diatom-infecting viruses, including the three classified members of the genus Bacillarnavirus (maximum-likelihood support value = 99) (Fig. 5). The RdRp of another near-complete genome (KB2009_con15) was most closely affiliated (maximum-likelihood support value = 100) with members of the family Dicistroviridae. One other genome (KB2009_con88) was more divergent and did not form any well-supported clades with any other viruses in the analysis.

FIG 4 Maps of assembled genomes. Genome maps for the six complete or near-complete genomes assembled in this study (produced in Geneious version 6.1.7) are shown. The unit of measure for the horizontal scale bar is nucleotides. The 5' untranslated region (UTR), intergenic region (IGR), and 3' UTR are represented by gray arrows, and the open reading frames and polyadenylated [poly(A)] tails are represented by orange and turquoise bars, respectively. The identified motifs (see Materials and Methods) are shown with arrows of different colors as follows: helicase, purple; protease, light blue; RNA-dependent RNA polymerase (RdRp), green; structural, wine, red, yellow, and light orange. Under each genome schematic is a graph of GC content along the genome (sliding window frame = 93), where % GC is shown with a blue line and % AT with a light-green line. KB2010_con16 and KB2009_con55 are represented by the same map (E) because they have identical genome organizations (see Results and Discussion). A protease is presumed to be present between the helicase and the RdRp in all genomes, but in some cases (A to D) there was insufficient similarity to a known protease in this region to map its location.

Two of the assembled genomes (KB2009_con55 and KB2010_con16) were nearly identical (99.8%). The 2009 genome was 361 bp shorter than the 2010 genome, missing 231 bp on the 5' end and 130 bp on the 3' end, presumably as a result of incomplete sequence coverage. That nearly all (19 of 22) of the differences between these two genomes were synonymous substitutions provides some confidence in the assemblies and suggests that the genomes derive from functionally equivalent strains. Whether these phylotypes coexist, or whether one replaced the other over time, we cannot discern from the present data. A high degree of genome sequence conservation over time (96% to 98% nucleic acid identity) was also observed among DNA-containing viruses over a 10-year period in coastal California (36). Although the collection dates for our samples were only 10 months apart, the data suggest that there could be similar genome stability among planktonic RNA viruses.
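The argument that 19 of 22 differences being synonymous supports both the assemblies and functional equivalence rests on classifying each difference by whether it changes the encoded amino acid. A simplified Python sketch of that classification follows. It assumes the two coding sequences are already aligned, in frame, and free of ambiguous bases, and it assigns all differences within a codon to a single class, which is cruder than formal synonymous/nonsynonymous site counting. The example sequences are invented.

```python
from itertools import product
from typing import Tuple

BASES = "TCAG"
# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def compare_coding(ref: str, alt: str) -> Tuple[int, int, float]:
    """Return (synonymous differences, nonsynonymous differences, identity)
    for two aligned, in-frame coding sequences of equal length."""
    ref = ref.upper().replace("U", "T")
    alt = alt.upper().replace("U", "T")
    if len(ref) != len(alt) or len(ref) % 3:
        raise ValueError("expected aligned, in-frame sequences of equal length")
    syn = nonsyn = 0
    for i in range(0, len(ref), 3):
        c_ref, c_alt = ref[i:i + 3], alt[i:i + 3]
        n_diff = sum(x != y for x, y in zip(c_ref, c_alt))
        if n_diff == 0:
            continue
        # A KeyError here means a non-ACGT base, which this sketch does not handle.
        if CODON_TABLE[c_ref] == CODON_TABLE[c_alt]:
            syn += n_diff
        else:
            nonsyn += n_diff
    identity = 1 - (syn + nonsyn) / len(ref)
    return syn, nonsyn, identity


if __name__ == "__main__":
    # GCU -> GCC keeps alanine (synonymous); AAA -> AGA turns lysine into arginine.
    print(compare_coding("AUGGCUAAA", "AUGGCCAGA"))
```

As a consistency check on the numbers reported above, 22 differences across the roughly 9,104 nt shared by the two assemblies correspond to an identity of 1 − 22/9,104 ≈ 99.76%, in line with the reported 99.8%.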

Each of the six assembled genomes contained a putative nonstructural gene with the highly conserved motifs of a nucleoside triphosphate (NTP)-binding domain and another gene with significant sequence similarity to the catalytic center of a family of +ssRNA virus RdRps. We located a region with sequence similarity to a family of viral 3C cysteine proteases in only two of the genomes (KB2009_con55 and KB2010_con16). The syntenic regions of the other genomes were of similar sizes but had no significant similarity to known proteases. Since a protease that processes the polyprotein is essential for the reproduction of known picornavirads, we believe that the syntenic regions in the other genomes also encode proteases that are highly divergent from known proteases. The structural genes (3 to 4 per genome) were homologous to the capsid-binding site of picornaviruses and to the VP4 and capsid proteins of dicistroviruses. In the untranslated regions (UTRs), no similarities were found to any experimentally verified internal ribosome entry site (IRES) structures. Poly(A) tails were present at the 3' end of four of the six assembled genomes.

Of the nine genomes that have been assembled from marine RNA metagenomes (six from this study and three from a previous study [18]), all but one are related to the genomes of viruses in the order Picornavirales, which is consistent with the relatively high frequency of picornavirad-like RdRp sequences in the libraries. The assembled genomes from this study share several characteristics with picornavirads. These include a monopartite or bipartite genome, the helicase-protease-replicase nonstructural gene order, and a poly(A) tail (37). The nonstructural gene cassette is closest to the 5' end and is followed by a block of structural genes. This configuration is similar to the gene order of viruses in the genus Bacillarnavirus and the families Dicistroviridae and Marnaviridae, all taxa within the Picornavirales.

TABLE 1 Summary information on assembled genomesᵃ

Genome name | Sample | Genome size (bp) | %GC | Avg coverage (no. of reads) | No. of ORFs | ORF 1 size (bp) | ORF 1 size (aa) | ORF 2 size (bp) | ORF 2 size (aa) | 5' UTR size (bp) | IGR size (bp) | 3' UTR size (bp) | % UTR | Poly(A) tail | % of totalᵇ
KB2010_con16 | 2010 | 9,465 | 38.9 | 529 | 2 | 5,388 | 1,796 | 2,718 | 906 | 773 | 296 | 289 | 14.3 | Y | 13
KB2009_con55 | 2009 | 9,104 | 38.7 | 115 | 2 | 5,388 | 1,796 | 2,718 | 906 | 773 | 296 | 289 | 14.9 | Y | 2
KB2009_con15 | 2009 | 9,387 | 46.8 | 169 | 1 | 8,481 | 2,827 | NA | NA | 585 | NA | 298 | 9.4 | Y | 3
KB2009_con28 | 2009 | 9,264 | 45.3 | 259 | 1 | 8,268 | 2,756 | NA | NA | 904 | NA | 161 | 11.5 | ND | 5
KB2009_con74 | 2009 | 8,500 | 42.6 | 131 | 2 | 5,415 | 1,805 | 2,718 | 906 | 61 | 259 | 47 | 4.3 | ND | 2
KB2009_con88 | 2009 | 8,330 | 36.4 | 81 | 1 | 8,127 | 2,709 | NA | NA | 28 | NA | 138 | 2.0 | Y | 1

ᵃ Estimated minimum genome size, % GC content, average coverage, the sizes of open reading frames (ORFs) and untranslated regions (UTRs), and whether a poly(A) tail was evident are presented for each of the six genomes assembled from the metagenomic libraries. IGR, intergenic region; NA, not available; ND, not detected; Y, yes.
ᵇ Listed in this column are the percentages of total bp assigned to a genome, calculated by multiplying the predicted size of the genome (in bp) by the average genome coverage value and dividing by the total no. of bp generated in the library (×100).

The number of sequences recruiting to the various contigs provides some clues about the composition of the RNA viral community (Table 1). Based on the predicted size and the average coverage of the assembled genomes and the total number of nucleotides sequenced in the library, the recruitment data imply that the community structures of the RNA viruses in these samples differ. For example, on the basis of this type of analysis, we estimate that KB2010_con16 represents 13% of the total RNA viruses in the 2010 sample, whereas the nearly identical KB2009_con55 accounts for only 2% of the 2009 library (Table 1). However, these estimates of dominance based on recruitment have considerable uncertainty because of the possibility of bias in the production of the metagenomic library, as discussed below.
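Footnote b of Table 1 defines the "% of total" column as the predicted genome size multiplied by the average coverage and divided by the total number of bases sequenced in the library. A one-function Python sketch of that arithmetic follows; the function name is hypothetical, and the library size in the example is invented, because per-library base totals are not given in this section.

```python
def percent_of_library(genome_size_bp: float, avg_coverage: float,
                       library_total_bp: float) -> float:
    """Percentage of a library's sequenced bases attributable to one genome:
    genome size x average coverage / total library bases x 100."""
    if library_total_bp <= 0:
        raise ValueError("library_total_bp must be positive")
    return genome_size_bp * avg_coverage / library_total_bp * 100


if __name__ == "__main__":
    # Hypothetical genome of 9,000 bp at 100x coverage in a 45-Mbp library.
    print(f"{percent_of_library(9_000, 100, 45_000_000):.1f}%")  # 2.0%
```

Because the calculation converts recruited bases directly into a share of the library, any step that over- or under-amplifies particular genomes during library construction propagates directly into the estimate, which is the source of the uncertainty noted above.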

FIG 5 Tree of the RdRp of assembled genomes. A rooted ML tree, in which the branches have been transformed, based on an alignment of the conserved regions of the RdRp (14) from the assembled genomes from this study, homologous sequences from representative viruses from the established taxa in the order Picornavirales (highlighted in color), CcloRNAV01 and CcloRNAV02 (viruses that infect a pennate diatom), and JP-A and JP-B (putative viral genomes assembled from a marine metagenome) is shown. The specific sequences in the alignment and the full names of the virus species represented by abbreviations in the figure are listed in Table S1 in the supplemental material. The statistical support values shown are percentages greater than 80 as calculated by the aLRT method. The scale bar represents 0.4 substitutions per site. Adjacent to each picornavirad taxon and environmental sequence is a schematic, based on a study by Le Gall et al. (16), that shows the gene order and genome organization for each genome type. The symbols are as follows: black circle, VPg (genome-linked protein); Hel, helicase; Pro, protease; Pol, polymerase; CP, capsid protein; black line, untranslated region; An, poly(A) tail. A question mark indicates that we detected no similarity to known protease sequences but presume that one is encoded.

Comparison of RNA viral metagenomes. In reciprocal BLAST analyses of the two libraries, 76% of the 2009 sequences had significant similarity (E value < 10⁻⁵) to sequences from the 2010 library, and 84% of the 2010 sequences had significant similarity to those from 2009. An intercomparison of these Kane'ohe Bay metagenomes and two other aquatic metagenomes targeting RNA viruses, one from coastal British Columbia (25) and one from an artificial lake in Maryland (24), revealed that sequences most similar to +ssRNA viral genomes outnumbered those matching dsRNA viral genomes in all cases (Fig. 6). Furthermore, no ambisense, −ssRNA, or retroviral sequences were detected in any of the libraries.
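The reciprocal comparison above reports, for each library, the fraction of reads with at least one significant hit (E value < 10⁻⁵) in the other library. The Python sketch below computes that fraction from BLAST tabular output. It assumes the searches were written with `-outfmt 6` and its default 12 columns (query ID in column 1, E value in column 11); that format, the function name, and the example rows are assumptions, since the search settings are not stated in this section. Running it once per direction yields the two percentages.

```python
import csv
import io
from typing import Iterable


def fraction_with_hits(blast_tab: str, query_ids: Iterable[str],
                       evalue_cutoff: float = 1e-5) -> float:
    """Fraction of queries with at least one hit below evalue_cutoff in
    BLAST tabular (-outfmt 6, default columns) output."""
    queries = set(query_ids)
    if not queries:
        raise ValueError("no query IDs supplied")
    hits = set()
    for row in csv.reader(io.StringIO(blast_tab), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        if float(row[10]) < evalue_cutoff:  # column 11 is the E value
            hits.add(row[0])
    return len(hits & queries) / len(queries)


if __name__ == "__main__":
    # Invented rows: read_1 hits significantly, read_2 does not, read_3 has no hit.
    tab = ("read_1\tcontig_9\t92.1\t250\t20\t0\t1\t250\t40\t289\t3e-80\t300\n"
           "read_2\tcontig_4\t41.0\t60\t35\t1\t5\t64\t10\t69\t0.2\t25.4\n")
    print(fraction_with_hits(tab, ["read_1", "read_2", "read_3"]))
```

Passing the full list of query IDs, rather than counting only IDs that appear in the output, matters: reads with no hit at all are absent from tabular output but still belong in the denominator.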

Most of the +ssRNA viral sequences in the marine samples, but not those in the freshwater sample, were assigned to the order Picornavirales. Among the picornavirad-like sequences, those with similarity to sequences of dicistrovirids and JP-B (a putative viral genome of unknown affiliation assembled from a marine sample) were identified in all four of the metagenomes. Sequences related to the diatom-infecting bacillarnaviruses were common in the Kane'ohe Bay libraries but were not detected in the coastal British Columbia or freshwater libraries. The relatively high representation of bacillarnavirus-like sequences in the libraries from Kane'ohe Bay is consistent with the importance of diatoms in this system, where diatom blooms often dominate the eukaryotic phytoplankton community (38).

Also notable was the detection, only in the freshwater library, of sequences most similar to those of viruses that infect land plants (e.g., tombusvirids, sobemovirids, and members of the Virgaviridae) or insects (iflavirids). The detection of sequences similar to those of plant and insect viruses in the lake, but not the sea, might be attributable in part to viruses entering this shallow retention basin in terrestrial runoff (24). However, many of the novel virus sequences recovered may derive from uncharacterized viruses that infect benthic or planktonic freshwater organisms.

The predominance of +ssRNA virus-like sequences in all of the libraries analyzed may reflect a higher relative abundance of these types of RNA viruses in aquatic environments, but we cannot yet rule out biases from the steps involved in library preparation. The amplification method we used in this study was found to introduce little intragenomic bias when used to sequence individual viruses having a variety of genome configurations (39), but there are no data on the relative efficiencies with which ssRNA and dsRNA genomes are recovered from a mixture of the two. A hypothesis that has yet to be tested is that the apparent low representation of dsRNA viruses in all of the libraries reflects interference from the reannealing of complementary strands during reverse transcription. Even if this were found to be true, it would not explain the dearth of sequences similar to those of other viruses having single-stranded RNA genomes (negative-sense, ambisense, and retroviral). Ultimately, quantitative assays of the major viral groups identified by metagenomic analyses will be needed to confirm the apparent dominance of picornavirads among marine RNA viruses.
Conclusion. The six genomes assembled as part of this study
represent a significant increase in the number of marine picorna- 
virad genome sequences available and will be useful for designing 
future experiments to test the ecological contributions of these 
viruses. 

The apparent dominance of picornavirad-like sequences in 
our samples from coastal tropical waters is consistent with a pre- 
vious metagenomic analysis of RNA viruses in coastal temperate 
waters (25), despite the use of very different methods in the two 
studies for each of three major steps: viral harvesting (ultrafiltra- 
tion versus flocculation), amplification (sequence-independent 
single-primer amplification [SISPA] versus linker ligation), and 
sequencing (454 versus Sanger). This suggests that, as docu- 
mented for metagenomes of DNA viruses (31, 40), the results are 
robust with respect to at least some methodological biases and that 
picornavirads may generally dominate the pool of RNA viruses in 
the ocean (but perhaps not in freshwater). However, quantitative 
analyses of additional marine habitats (e.g., polar, open ocean) 
and a thorough evaluation of potential biases during reverse tran- 
scription are needed to either bolster or banish this incipient par- 
adigm. 

MATERIALS AND METHODS 

A description of the study site and methodological details concerning 
sample collection, processing, and construction of the metagenome was 
provided in a companion study (4). Summaries of those elements are 
provided here, as a convenience to the reader, along with more-detailed 
descriptions of the analyses specific to this paper. 

Sample collection and processing. Surface seawater (<0.5-m depth)
was collected on 1 August 2009 (35 liters) and 3 June 2010 (40 liters) in
polycarbonate carboys from a pier in Kane'ohe Bay, Hawai'i
(21°25'46.80"N, 157°47'31.51"W), a reef-protected, subtropical
embayment located on the windward side of O'ahu. Samples were filtered
(Sterivex; Millipore; 0.22-µm pore size) to remove cells and larger
particles, and viruses in the filtrate were then concentrated by chemical
flocculation and filtration (41) followed by centrifugal ultrafiltration
(Amicon 15; Millipore; 30 kDa). Viruses from concentrated, virus-enriched
samples were purified using sequential step and continuous CsCl buoyant 
density gradients (29). Fractions of approximately 0.5 ml were collected 
from the continuous gradient (22 to 23 fractions per gradient). After 
buffer exchange into TE (10 mM Tris, 1 mM EDTA, pH 8), nucleic acids 
were extracted from a portion of each density fraction (QIAamp MinElute 
Viral Spin kit; Qiagen). Each nucleic acid extract was treated with DNase 
to remove any copurified DNA, and the RNA concentration was deter- 
mined by fluorometry (4). 

Amplification, cloning, and sequencing of RdRp genes. Reverse
transcription-PCR (RT-PCR) with two sets of degenerate primers targeting
marine picorna-like virus subclades 1 and 2 (Mplsc1 and Mplsc2; see
Table S2 in the supplemental material) was performed with RNA template
from each buoyant density fraction according to the protocol described by 
Culley and Steward (18). The resulting endpoint PCR products were sep- 
arated on a 1% agarose gel and visualized on a digital gel documentation 
system. The intensity of PCR amplification in each fraction was measured 
using Molecular Imaging Software (Kodak). Amplified products from the 
1.38 g ml⁻¹ and the 1.49 g ml⁻¹ buoyant density fractions were excised
from the gel and purified separately (MinElute Gel Extraction kit;
Qiagen). The ends of the purified products were repaired (PCRTerminator
End Repair kit; Lucigen) and ligated into the pSMART-HCKan vector
(Lucigen). Ligated vector was transformed into E. cloni 10G Supreme cells
(Lucigen) via electroporation using the supplier's recommended condi- 
tions. Clones were screened for insertions by PCR amplification, and 
products in the correct size range were purified and sequenced by Sanger 
sequencing with fluorescent dye terminators (Applied Biosystems). 

Quantification of RdRp phylotypes in the buoyant density gradient.
Reverse transcription quantitative PCR (RT-qPCR) was used to determine
the abundances of two RNA virus phylotypes identified in the 2009
sample. Primers were designed to target regions of the RdRp unique to 
each phylotype (see Table S2 in the supplemental material). Reactions 
were performed with RNA template from each of the RNA viral buoyant 
density fractions from the 2009 sample. Reaction mixtures for cDNA
synthesis (SuperScript III; Invitrogen Corporation) consisted of 5 µl of the
extracted, DNase-treated RNA template, a 0.2 mM concentration of each
deoxynucleoside triphosphate, and a 0.5 µM concentration of reverse
primer. Samples were denatured at 65°C for 5 min, cooled on ice, and
then supplemented with 1× First-Strand Buffer (Invitrogen Corporation),
5 mM dithiothreitol, 40 U RNase inhibitor (RNaseOUT; Invitrogen
Corporation), and 200 U reverse transcriptase (SuperScript III; Invitrogen
Corporation) to obtain a final reaction mixture volume of 20 µl. The reaction
mixtures were held at 55°C for 60 min and then at 70°C for 15 min as a final
termination step, after which they were supplemented with 1 µl RNase H
(Invitrogen Corporation) and incubated for 20 min at 37°C. The qPCR
amplification was performed on a 7300 real-time PCR system (Applied
Biosystems) with Power SYBR Green PCR Master Mix (Applied Biosystems). The
reaction mixtures contained 12.5 µl SYBR Green PCR master mix (Applied
Biosystems), a 0.2 µM concentration of each primer, and 2 µl of
sample cDNA template, with a final volume of 25 µl. For each primer set,
reactions were replicated two times, and each set of reaction mixtures 
contained duplicate samples, standards, and negative controls. Standards 
consisted of 10-fold serial dilutions (3 × 10¹ to 3 × 10⁹ molecules per
reaction) of the target molecule that had been cloned, amplified using
appropriate primers, purified by agarose gel electrophoresis, and extracted with
a MinElute Gel Extraction kit (Qiagen). Quantities of DNA were
determined fluorometrically with a Quant-iT DNA Assay kit (Invitrogen
Corporation). The thermal cycling protocol consisted of a denaturation at
95°C for 10 min, followed by 40 cycles of denaturation at 95°C for 15 s, 
annealing at the primer-specific temperature (see Table S2) for 30 s, ex- 
tension at 72°C for 35 s, and a final extension at 72°C for 14 min and 25 s. 
The specificity of each primer set was verified by an analysis of the DNA 
melting curve of the amplification products over the course of the reaction 
as well as by independent experiments in which the amplification efficiency
of the primers was determined in samples spiked with various amounts of 
nontarget template. 
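The standards described above are used to build a standard curve that converts each sample's quantification cycle (Cq) into copies per reaction. The Python sketch below illustrates that arithmetic under stated assumptions: Cq values are taken as already exported from the instrument software, duplicate reactions are assumed to have been averaged beforehand, and all function names are hypothetical rather than part of the Applied Biosystems software.

```python
import math
from statistics import mean

def fit_standard_curve(copies, cq):
    """Ordinary least-squares fit of Cq = slope * log10(copies) + intercept."""
    x = [math.log10(c) for c in copies]
    x_bar, y_bar = mean(x), mean(cq)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, cq))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

def amplification_efficiency(slope):
    """E = 10^(-1/slope) - 1; a slope near -3.32 corresponds to E close to 1 (100%)."""
    return 10 ** (-1.0 / slope) - 1.0

def copies_from_cq(cq, slope, intercept):
    """Invert the standard curve to estimate target copies per reaction."""
    return 10 ** ((cq - intercept) / slope)
```

For example, with hypothetical standards lying exactly on a line of slope −3.32 and intercept 40, a sample Cq of 23.4 maps back to 10⁵ copies per reaction, and the fitted slope implies an efficiency of roughly 100%.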

Metagenome construction. For each sampling date, RNA purified 
from density fractions in the range from 1.38 to 1.49 g ml⁻¹ was pooled
and subjected to a random-priming-mediated sequence-independent 
single-primer amplification (RP-SISPA) reaction (39, 42). After size se- 
lection (500 to 1,000 bp) on an agarose gel, RP-SISPA products were 
sequenced with a GS FLX Titanium platform (Roche Diagnostics Corpo- 
ration). Sequences were run through a quality-control pipeline (43) to 
remove short and low-quality reads and presumed artificial replicates, and 
the ends were trimmed to remove any remaining primer sequences.
Rarefaction curves for each RNA viral metagenome were generated in
METAVIR (44) using equal samplings of the unassembled, quality-controlled
sequences and a clustering percentage of 75%.
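Rarefaction curves like those described above plot the expected number of sequence clusters against the number of reads sampled. METAVIR's internal implementation is not described here, so the sketch below is an assumption: it applies the standard analytical (Hurlbert-style) rarefaction formula, E[S_n] = Σ_i [1 − C(N − N_i, n) / C(N, n)], to cluster sizes that are taken as already produced upstream by clustering the reads at 75% (the clustering step itself is not shown). Function names are hypothetical.

```python
import math

def rarefy(cluster_sizes, n):
    """Expected number of clusters observed in a random subsample of n reads,
    drawn without replacement, given the number of reads in every cluster."""
    total = sum(cluster_sizes)
    if not 0 <= n <= total:
        raise ValueError("subsample size must lie between 0 and the number of reads")
    denom = math.comb(total, n)
    # A cluster is missed only if all n reads come from outside it;
    # math.comb returns 0 when n exceeds the number of reads outside the cluster.
    return sum(1 - math.comb(total - size, n) / denom for size in cluster_sizes)

def rarefaction_curve(cluster_sizes, step):
    """(subsample size, expected clusters) pairs from 0 to the full library."""
    total = sum(cluster_sizes)
    return [(n, rarefy(cluster_sizes, n)) for n in range(0, total + 1, step)]
```

The exact formula avoids the noise of repeated random subsampling; for very large libraries the big-integer binomials become slow, and a random-resampling approach may be preferable.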

Identification and analysis of RdRp genes in the metagenome. A hidden
Markov model was built and used to identify RdRp-like sequences in
each of the metagenomes using HMMER v3.0 (45). The models were
first produced based on amino acid alignments of conserved RdRp
regions (46) of representative picornavirad-like viruses and conserved
RdRp regions (47) of representative reovirids (see Table S1 in the
supplemental material). Sequences fitting the criteria of the models were
retrieved from the libraries after translation in all six reading frames. RdRp
sequences that matched a known RdRp gene by BLAST with an E value ≤ 10⁻³
were considered to be significant. We used this approach to identify (i)
sequences that contained the smallest, still-identifiable region of the RdRp
(conserved domains 6 to 7) (14), (ii) picornavirad-like sequences in the
libraries that contained the RdRp regions targeted by the Mpl primers, which
include conserved domains 4 to 7 (14), and (iii) sequences containing
regions 1 to 4 conserved in the RdRp of reovirids (48). Translated
picornavirad-like RdRp sequences were clustered at the putative species
level based on a conservative phylogenetic distance criterion as previously
described (18). In essence, the greatest distance between any two officially
classified strains of picornavirads belonging to the same species (Human
Rhinovirus 2 and Human Rhinovirus 89) was taken as the species threshold.
Sequences separated by less than this threshold distance were considered
members of the same species, and those separated by more than the threshold
were considered to be different species. The Chao1 estimator (49) of total
phylotype or species richness was calculated with EstimateS (50). In light
of the large uncertainty in the estimator, values were rounded to the
nearest hundred.
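The species-level clustering and richness estimate described above can be sketched as follows. Two details are assumptions rather than statements from the text: sequences are grouped by single linkage (a chain of sub-threshold pairwise distances joins sequences into one putative species), and Chao1 is computed with the classic formula S_obs + F1²/(2·F2), falling back to the bias-corrected term F1(F1 − 1)/2 when there are no doubletons. EstimateS offers more than one Chao1 variant, so values from this sketch may differ slightly from those it reports. The distance function and threshold are placeholders supplied by the caller.

```python
import math
from collections import Counter

def cluster_by_threshold(names, dist, threshold):
    """Group sequences into putative species by single linkage.

    dist(a, b) returns a pairwise distance; any pair closer than `threshold`
    is merged, and merges are transitive. Returns name -> cluster root.
    """
    parent = {n: n for n in names}

    def find(x):
        # Walk to the cluster root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if dist(a, b) < threshold:
                parent[find(a)] = find(b)
    return {n: find(n) for n in names}

def chao1(abundances):
    """Classic Chao1 from the number of sequences in each cluster."""
    s_obs = sum(1 for a in abundances if a > 0)
    f1 = sum(1 for a in abundances if a == 1)  # singletons
    f2 = sum(1 for a in abundances if a == 2)  # doubletons
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    return s_obs + f1 * (f1 - 1) / 2  # bias-corrected term when F2 = 0

def round_to_hundred(x):
    """Round half up to the nearest hundred (avoids Python's banker's rounding)."""
    return int(math.floor(x / 100 + 0.5)) * 100

def estimate_species_richness(names, dist, threshold):
    """Return (observed putative species, Chao1 rounded to the nearest hundred)."""
    clusters = cluster_by_threshold(names, dist, threshold)
    sizes = list(Counter(clusters.values()).values())
    return len(sizes), round_to_hundred(chao1(sizes))
```

Single linkage is the most permissive reading of the criterion; a complete-linkage variant (every member within the threshold of every other) would yield more, smaller clusters.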

Maximum-likelihood trees were constructed with PhyML (51) from
protein sequences aligned with MAFFT (52) using its automatic strategy selection (--auto).

Sequence assembly and library comparison. Sequences were assembled
using CLC Genomics Workbench version 5.0 with the following parameters:
global alignment; minimum contig length, 200; mismatch, insertion, and
deletion costs, 3; length fraction, 0.5; similarity threshold, 0.8;
automatic word size, 20; and bubble size, 50. Assembly statistics are
presented elsewhere (4). Contigs and singletons from the assembled
libraries were classified with MEGAN (53) based on blastx (54) searches of
the NCBI nonredundant protein database, where hits with an E value < 10⁻⁵
were considered significant. We used
this conservative, but frequently used, cutoff value to further reduce the 
likelihood of the misclassification of a sequence. All of the individual reads 
comprising a given contig inherited the taxonomic assignment given to 
the contig. 
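The classification rule described above, an E-value filter followed by inheritance of each contig's assignment by its member reads, can be illustrated with a small sketch. MEGAN assigns sequences with a lowest-common-ancestor (LCA) algorithm that has additional parameters (such as minimum-score and top-percent filters) not modeled here; this simplified version takes only the longest shared prefix of the lineages of all significant hits. The input structures (hit tuples carrying a lineage, and a read-to-contig map) are hypothetical.

```python
from collections import defaultdict

E_CUTOFF = 1e-5  # significance cutoff stated in the text

def lca(lineages):
    """Longest shared prefix of root-to-leaf lineages (their lowest common ancestor)."""
    common = []
    for ranks in zip(*lineages):
        if all(r == ranks[0] for r in ranks):
            common.append(ranks[0])
        else:
            break
    return tuple(common)

def classify(hits, cutoff=E_CUTOFF):
    """hits: iterable of (query_id, evalue, lineage_tuple).
    Queries with no hit below the cutoff are left unassigned."""
    by_query = defaultdict(list)
    for query, evalue, lineage in hits:
        if evalue < cutoff:
            by_query[query].append(lineage)
    return {q: lca(ls) for q, ls in by_query.items()}

def propagate_to_reads(contig_taxa, read_to_contig):
    """Each read inherits its contig's assignment; reads whose contig
    was not classified stay unassigned."""
    return {read: contig_taxa[contig]
            for read, contig in read_to_contig.items()
            if contig in contig_taxa}
```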

We compared the two Kane'ohe Bay (KB) libraries by BLAST analysis
(blastx; E-value cutoff, <10⁻⁵), in which the unassembled reads from one
library were used to query the second library and vice versa.
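For a reciprocal comparison like the one described above, one simple summary is the fraction of reads in each library that have at least one significant hit in the other. The sketch below assumes the searches were exported in BLAST+ tabular format (-outfmt 6), in which the 11th column holds the E value; the summary statistic and function names are illustrative and are not necessarily the measure reported in this paper.

```python
def reads_with_hits(tabular_lines, cutoff=1e-5):
    """Query IDs with at least one hit below the E-value cutoff in
    BLAST tabular (-outfmt 6) output; column 11 is the E value."""
    hits = set()
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        if float(fields[10]) < cutoff:
            hits.add(fields[0])
    return hits

def shared_fraction(query_ids, tabular_lines, cutoff=1e-5):
    """Fraction of the query library's reads with a significant hit."""
    if not query_ids:
        raise ValueError("query library is empty")
    hit_ids = reads_with_hits(tabular_lines, cutoff) & set(query_ids)
    return len(hit_ids) / len(query_ids)

def reciprocal_summary(reads_a, a_vs_b, reads_b, b_vs_a, cutoff=1e-5):
    """(fraction of A with hits in B, fraction of B with hits in A)."""
    return (shared_fraction(reads_a, a_vs_b, cutoff),
            shared_fraction(reads_b, b_vs_a, cutoff))
```

Reporting both directions matters because libraries of different size and diversity need not overlap symmetrically.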

Analysis of assembled genomes. Open reading frames were identified 
with the heuristic approach for gene prediction described by Besemer and 
Borodovsky (55). Searches of the NCBI Conserved Domain Database (CDD)
(56) were conducted with the translated ORFs from each genome (Table 1).
Searches of a database of experimentally verified internal ribosomal entry
site (IRES) structures (57) were conducted with the untranslated regions
(UTRs) of the six genomes.
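The heuristic gene-prediction method cited above builds composition-based models and is not reproduced here. As a much simpler stand-in, the sketch below translates a genome in all six reading frames with the standard genetic code and reports ATG-to-stop open reading frames above a minimum length. The 100-amino-acid default is an arbitrary assumption, and ORFs that run off the end of the sequence without a stop codon are ignored. For +ssRNA genomes such as those assembled here, only the three forward-strand frames are normally of interest.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def translate(seq):
    """Translate complete codons; ambiguous codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))

def six_frames(seq):
    """Yield (strand, frame, protein) for all six reading frames."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    rc = seq.translate(COMPLEMENT)[::-1]
    for strand, s in (("+", seq), ("-", rc)):
        for frame in range(3):
            yield strand, frame, translate(s[frame:])

def find_orfs(seq, min_aa=100):
    """Return (strand, frame, aa_start, protein) for ATG-to-stop ORFs.
    aa_start indexes the frame's translation; the nucleotide offset on that
    strand is frame + 3 * aa_start."""
    orfs = []
    for strand, frame, protein in six_frames(seq):
        start = None
        for i, aa in enumerate(protein):
            if aa == "M" and start is None:
                start = i
            elif aa == "*" and start is not None:
                if i - start >= min_aa:
                    orfs.append((strand, frame, start, protein[start:i]))
                start = None
    return orfs
```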

Accession numbers. Metagenomic data referred to in this paper are 
available at the CAMERA website (http://camera.calit2.net/) under
accession number CAM_PROJ_BROADPHAGE and sample names
CAM_SMPL_000815 (1 August 2009) and CAM_SMPL_000824 (3 June 2010).
The accession numbers of the MPL phylotypes described in this research
are listed in Table S1 in the supplemental material.

SUPPLEMENTAL MATERIAL 

Supplemental material for this article may be found at http://mbio.asm.org/lookup/suppl/doi:10.1128/mBio.01210-14/-/DCSupplemental.

Figure S1, EPS file, 0.7 MB.

Figure S2, EPS file, 2.3 MB. 

Table S1, PDF file, 0.4 MB.

Table S2, PDF file, 0.1 MB. 